Abstract



We study graph orientations that minimize the entropy of the in-degree sequence. We 
prove that the minimum entropy orientation problem is NP-hard even if the graph is 
planar, and that there exists a simple linear-time algorithm that returns an approximate 
solution with an additive error guarantee of 1 bit. 

1 Introduction

All graphs considered here are finite, undirected and loopless, but multiple edges are allowed. 
Let $G = (V, E)$ be a graph with $n$ vertices and $m$ edges, and consider any directed graph $\vec{G}$
obtained by orienting the edges of $G$. The in-degree distribution $p$ of this orientation is
defined by $p_v := \rho_{\vec{G}}(v)/m$, where $\rho_{\vec{G}}(v)$ denotes the in-degree of $v$ in $\vec{G}$.

In this paper, we consider the problem of finding an orientation whose corresponding 
in-degree distribution is as unbalanced as possible. As a balance measure, we use the entropy

$$H(p) := -\sum_{v \in V} p_v \log p_v,$$

where log denotes the base 2 logarithm, and $0 \log 0 := 0$. The minimum entropy orientation
problem (MINEO) is the problem of finding an orientation of $G$ with an in-degree distribution
$p$ minimizing $H(p)$.
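
As a concrete illustration of this definition, the following minimal Python sketch computes the entropy of the in-degree distribution of an orientation. The function name `in_degree_entropy` and the representation of an orientation as a list of (tail, head) pairs are assumptions made for this example only.

```python
from collections import Counter
from math import log2

def in_degree_entropy(arcs):
    """Entropy (in bits) of the in-degree distribution of an orientation.

    `arcs` is a list of (tail, head) pairs, one per edge of the multigraph.
    """
    m = len(arcs)
    indeg = Counter(head for _, head in arcs)
    # Vertices of in-degree 0 never appear in `indeg`, which matches
    # the convention 0 log 0 = 0.
    return -sum((k / m) * log2(k / m) for k in indeg.values())

# Two orientations of a triangle on the vertices 0, 1, 2.
cyclic = [(0, 1), (1, 2), (2, 0)]    # in-degrees (1, 1, 1)
acyclic = [(0, 1), (0, 2), (1, 2)]   # in-degrees (0, 1, 2)
```

For the triangle, the cyclic orientation has entropy log 3, while the acyclic one is more unbalanced and has strictly smaller entropy.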

The study of MINEO is motivated by that of the minimum entropy set cover problem
(MINESC), introduced by Halperin and Karp [8]. In the latter problem, we are given a
ground set $U$ and a collection $\mathcal{S} = \{S_1, \dots, S_q\}$ of subsets of $U$ whose union is $U$, and we
have to assign each element of $U$ to a subset $S_i$ containing it. This assignment partitions
$U$ into classes $U_1, U_2, \dots, U_q$ of elements assigned to the same subset, and the objective is
to minimize the entropy of the probability distribution defined by this partition. Hence,
MINEO is the special case of MINESC where each element of the ground set can be covered
by exactly two sets from $\mathcal{S}$. (To see this, take $U := E$ and let $\mathcal{S} := \{S_v \mid v \in V\}$ with
$S_v := \{e \in E \mid e \text{ is incident to } v\}$.)

Figure 1: Construction of tight examples for the greedy algorithm. Given a value $t$ (in our
illustration, $t = 4$), let $S$ be a set of $t!$ independent vertices (the red vertices). Then for each $i$
in $\{1, 2, \dots, t\}$, construct $t!/i$ independent vertices of degree $i$ with neighbors in $S$, such that
their neighborhoods partition $S$ into subsets of size $i$. The optimal solution is obtained by
orienting each edge toward its endpoint in $S$. The greedy algorithm may orient each edge in
the opposite direction. This example is equivalent to the tight example previously given for
the minimum entropy set cover problem [3].
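
The gap between the two orientations described in the caption of Figure 1 can be computed without building the graph, since only the in-degree counts matter. The sketch below does this in Python; the function name `tight_example_entropies` is an assumption of this example, and the formulas in the comments are derived from the construction in the caption.

```python
from math import factorial, log2, e

def tight_example_entropies(t):
    """Entropies (optimal, bad greedy run) for the Figure 1 graph with parameter t."""
    size_s = factorial(t)      # |S| = t!
    m = t * size_s             # each vertex of S has one neighbour for each i in 1..t
    # Optimal: every edge points into S, so each of the t! vertices has in-degree t.
    optimal = log2(size_s)
    # Bad greedy run: every edge points away from S, so for each i there are
    # t!/i vertices of in-degree i.
    greedy = sum((size_s // i) * (i / m) * log2(m / i) for i in range(1, t + 1))
    return optimal, greedy

opt, greedy = tight_example_entropies(4)
gap = greedy - opt             # equals log t - (1/t) log t! for t = 4
```

In general the gap equals log t - (1/t) log t!, which stays below log e and tends to log e as t grows, in line with the tightness claim.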

The main motivation of Halperin and Karp for introducing MINESC was to solve a haplotyping
problem of importance in computational biology. This problem involves covering
a set $U$ of partial haplotypes of length $d$, defined as words in the set $\{0, 1, *\}^d$, by complete
haplotypes, defined as words in $\{0, 1\}^d$. Each subset in $\mathcal{S}$ corresponds to a complete haplotype
$h \in \{0, 1\}^d$ and contains all partial haplotypes that are compatible with $h$, that is,
partial haplotypes whose symbols match $h$ in every non-'*' position. The '*' positions
in the partial haplotypes are interpreted as measurement errors. It is shown that, under some
probabilistic assumptions, minimizing the entropy of the covering amounts to maximizing its
likelihood [8].

An application of MINEO is the special case of partial haplotyping in which each partial
haplotype has at most one '*' in it. Partial haplotypes are then edges of a $d$-dimensional
hypercube, and the subsets $S_i$ correspond to vertices of this hypercube. In other words, this
special case of the partial haplotyping problem is a minimum entropy orientation problem in
a partial hypercube.
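
To make the correspondence concrete, the short Python sketch below maps a partial haplotype with exactly one '*' to the hypercube edge joining its two compatible complete haplotypes. The function name `haplotype_to_edge` is hypothetical, and the restriction to exactly one '*' (rather than at most one) is an assumption made to keep the example minimal.

```python
def haplotype_to_edge(partial):
    """Map a partial haplotype with exactly one '*' to a hypercube edge.

    The edge is returned as the pair of complete haplotypes obtained by
    replacing the '*' with 0 and with 1.
    """
    if partial.count("*") != 1:
        raise ValueError("expected exactly one '*'")
    return (partial.replace("*", "0"), partial.replace("*", "1"))

edge = haplotype_to_edge("01*")   # joins the hypercube vertices 010 and 011
```

Orienting this edge toward one of its endpoints then corresponds to explaining the partial haplotype by that complete haplotype.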

The well-known greedy algorithm for the set cover problem is applicable to MINEO.
It involves iterating the following steps: choose a maximum degree vertex $v$ in $G$, orient
all edges incident to $v$ toward $v$, and remove $v$ from $G$. The performance of the greedy
algorithm for MINESC has been studied thoroughly. Halperin and Karp [8] first showed
that the greedy algorithm approximates MINESC to within some additive constant. Then
the current authors improved their analysis and showed that the greedy algorithm returns a
solution whose entropy is at most the optimum plus $\log e$ bits, with $\log e \approx 1.4427$ bits [3].
Moreover, they proved that it is NP-hard to approximate MINESC to within an additive
error of $\log e - \varepsilon$, for every constant $\varepsilon > 0$. Since MINEO is a special case of MINESC, the
first result implies that the greedy algorithm also approximates MINEO to within an additive
error of $\log e$ bits. We note that there exist instances of MINEO where the latter bound is
(asymptotically) attained; see for example those described in Figure 1.
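
The greedy steps described above can be sketched as follows in Python. This is a simple quadratic-time illustration, not an optimized implementation; the function name `greedy_orientation`, the edge-list input format, and the tie-breaking rule (first maximum-degree vertex found) are assumptions of this example.

```python
from collections import defaultdict

def greedy_orientation(edges):
    """Repeatedly pick a maximum-degree vertex and orient all of its
    remaining edges toward it. Returns a list of (tail, head) arcs."""
    incident = defaultdict(set)   # vertex -> indices of its unoriented edges
    for idx, (u, v) in enumerate(edges):
        incident[u].add(idx)
        incident[v].add(idx)
    arcs = []
    while any(incident.values()):
        # Maximum-degree vertex in the remaining graph.
        v = max(incident, key=lambda x: len(incident[x]))
        for idx in list(incident[v]):
            a, b = edges[idx]
            w = b if a == v else a        # the other endpoint (graph is loopless)
            arcs.append((w, v))           # orient the edge toward v
            incident[w].discard(idx)
        del incident[v]                   # remove v from the graph
    return arcs

star = [(0, 1), (0, 2), (0, 3)]
star_arcs = greedy_orientation(star)      # all three edges point to vertex 0
```

Edges are tracked by index so that multiple edges between the same pair of vertices are handled separately.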

In this paper, we first prove that MINEO is NP-hard, even if the input graph is planar
(Section 2). The reduction is from a restricted version of the 1-in-3 Satisfiability problem.
Then, we show in Section 3 that there exists a simple linear-time approximation algorithm
for MINEO with an improved approximation guarantee of 1 bit.

To conclude the introduction, we mention that MINEO is also related to the minimum
sum vertex cover problem (MINSVC), introduced by Feige, Lovász, and Tetali [5]. In that
problem, we are given a graph $G$, and have to find an ordering $v_1, v_2, \dots, v_n$ of its vertices
such that the average cover time of an edge is minimized, that is, minimizing

$$\sum_{i=1}^{n} i \cdot |f(v_i)|,$$

where $f(v_i)$ denotes the set of edges that are incident to $v_i$ but to no vertex $v_j$ with $j < i$.
This problem can also be seen as a graph orientation problem in which the most unbalanced
orientation is sought, although with a different balance measure than that of MINEO. Feige
et al. [5] proved that MINSVC is APX-hard, and gave a 2-approximation algorithm, based on
randomized rounding of a natural linear programming relaxation. Using a different rounding
technique, Berenholz, Feige, and Peleg [1] recently derived an improved approximation factor
of 1.99995.
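
For comparison with the entropy objective, the sketch below evaluates the total cover time of a vertex ordering, that is, the average cover time of an edge multiplied by the number of edges. It uses the fact that an edge is covered at the position of its earlier endpoint in the ordering. The function name `min_sum_cost` is an assumption of this example.

```python
def min_sum_cost(order, edges):
    """Total cover time of the ordering `order` for the given edge list."""
    position = {v: i for i, v in enumerate(order, start=1)}
    # Each edge is covered when its first endpoint in the ordering is reached.
    return sum(min(position[u], position[v]) for u, v in edges)

path = [("a", "b"), ("b", "c")]
cost_middle_first = min_sum_cost(["b", "a", "c"], path)   # both edges covered at time 1
cost_end_first = min_sum_cost(["a", "b", "c"], path)      # edges covered at times 1 and 2
```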

Although MINEO and MINSVC share some common properties, the main difference between
these two problems is perhaps how they behave with respect to instances that are the
union of smaller ones: As noted by Berenholz et al. [1], MINSVC is not "linear", in the sense
that an optimal solution to the union of two disjoint graphs $G_1$ and $G_2$ is not necessarily
a combination of an optimal solution to $G_1$ and $G_2$, respectively. (In particular, the APX-hardness
proof of Feige et al. [5] relies on this non-linearity.) On the other hand, MINEO is
linear, as can be easily checked.

2 Hardness 

Let $a = (a_1, a_2, \dots, a_n)$ and $b = (b_1, b_2, \dots, b_n)$ be two sequences of non-negative integers
sorted in non-increasing order, and such that $\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i =: m$. The sequence
$a$ dominates sequence $b$ if

$$\sum_{j=1}^{i} a_j \ge \sum_{j=1}^{i} b_j \qquad (1)$$

for every $i \in \{1, \dots, n\}$, and moreover (1) holds with strict inequality for at least one such
$i$. We emphasize that $a \ne b$ whenever $a$ dominates $b$. The following lemma is a standard
consequence of the strict concavity of the function $x \mapsto -x \log x$; see e.g. [9] for a
proof.

Lemma 1. If $a$ dominates $b$, then $H(a/m) < H(b/m)$.
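
The dominance relation and the statement of Lemma 1 can be checked on small sequences with the following Python sketch. The function names `dominates` and `entropy` are assumptions of this example; both expect sequences sorted in non-increasing order with a common sum.

```python
from itertools import accumulate
from math import log2

def dominates(a, b):
    """True if the non-increasing sequence a dominates the non-increasing sequence b."""
    if len(a) != len(b) or sum(a) != sum(b):
        raise ValueError("sequences must have equal length and equal sum")
    prefix_pairs = list(zip(accumulate(a), accumulate(b)))
    # Every prefix sum of a is at least that of b, and at least one is strictly larger.
    return (all(x >= y for x, y in prefix_pairs)
            and any(x > y for x, y in prefix_pairs))

def entropy(seq):
    """Entropy (in bits) of the distribution seq / sum(seq)."""
    m = sum(seq)
    return -sum((k / m) * log2(k / m) for k in seq if k > 0)

a = (4, 3, 3, 1, 0, 0)
b = (3, 3, 3, 1, 1, 0)
lemma_holds = dominates(a, b) and entropy(a) < entropy(b)
```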

Theorem 1. Finding a minimum entropy orientation of a planar graph is NP-hard. 

Proof. In the 1-in-3 Satisfiability problem, we are given a 3-SAT formula as input, and we
have to decide whether there exists a truth assignment of the variables such that each clause is
satisfied by exactly one of its three literals. Moore and Robson [12] proved that this problem
is NP-complete, even if every variable appears in exactly three clauses, there is no negation in
the formula, and the bipartite graph obtained by linking a variable and a clause if and only
if the variable appears in the clause is planar.

We will reduce the latter restriction of the 1-in-3 Satisfiability problem to MINEO. It will
be convenient for the proof to restate Moore and Robson's result in the context of the Exact
Cover problem. The latter asks, given a set system $(U, \mathcal{S})$, to decide if $U$ can be covered
using pairwise disjoint sets from $\mathcal{S}$. As noted by Li and Toulouse [11], the NP-completeness
of the version of 1-in-3 Satisfiability described above directly implies that Exact Cover is
NP-complete even when every set in $\mathcal{S}$ has cardinality exactly 3, each element in $U$ is included
in exactly three sets of $\mathcal{S}$, and the "elements versus sets" incidence graph is planar. (To see
this, consider the set system $(U, \mathcal{S})$ where $U$ and $\mathcal{S}$ are associated with the clauses and the
variables of the 1-in-3 Satisfiability instance, respectively.) Let $(U, \mathcal{S})$ be any such set system,
and denote by $u_1, \dots, u_q$ and $S_1, \dots, S_q$ the elements of $U$ and the sets in $\mathcal{S}$, respectively. We
may assume without loss of generality that $q$ is a multiple of 3, since otherwise $(U, \mathcal{S})$ has no
exact cover.

Figure 2: Gadget for element $u_j$.

We construct a graph $G = (V, E)$ as follows: First, create a vertex $s_i$ per set $S_i$. Then, for
each element $u_j$, add a copy of the gadget depicted in Figure 2 and link $u_{j,k}$ ($1 \le k \le 3$) to
$s_{j_k}$, where $j_1, j_2, j_3$ are the indices of the three sets containing $u_j$. The fact that $G$ is planar
directly follows from the planarity of the bipartite graph underlying the set system $(U, \mathcal{S})$.

Let $\vec{G}$ be any orientation of $G$ with minimum entropy, and denote by $A$ its arc set. For a
subset $X \subseteq V$ of vertices, we use $\delta(X)$ for the number of arcs going from $X$ to $V \setminus X$ in $\vec{G}$.
We define the in-degree sequence of $X$, denoted in-seq($X$), as the sequence of in-degrees of
the vertices in $X$, sorted in non-increasing order. Let also $X_j := \{u_{j,1}, u_{j,2}, u_{j,3}, v_{j,1}, v_{j,2}, v_{j,3}\}$,
for every $j \in \{1, \dots, q\}$.

Claim 1. The in-degree sequence of the set $X_j$ in $\vec{G}$ is given by the following table:

    $\delta(X_j)$    in-seq($X_j$)
    0                (4,3,3,1,1,0)
    1                (4,3,3,1,0,0)
    2                (4,3,2,1,0,0)
    3                (3,3,2,1,0,0)


Proof. Denote by $(c_1, c_2, \dots, c_6)$ the in-degree sequence of the set $X_j$ in $\vec{G}$. Arguing by
contradiction, we suppose that this sequence is different from the one given in the table.
Considering the structure of the element-gadget, and in particular its two disjoint triangles,
we infer the following bounds:

$$c_1 \le \begin{cases} 4 & \text{if } \delta(X_j) < 3; \\ 3 & \text{otherwise;} \end{cases}$$
$$c_2 \le 3; \quad c_3 \le 3; \quad c_4 \ge 1;$$
$$c_5 \ge 1 \ \text{ if } \delta(X_j) = 0.$$

Figure 3: A good orientation of the element-gadget. It is assumed that the arcs going out of
$X_j$ occur clockwise from the top of the figure.

It follows from the inequalities above and $\sum_{i=1}^{6} c_i = 12 - \delta(X_j)$ that in-seq($X_j$) is dominated
(in the sense of Lemma 1) by the corresponding sequence in the table. Now, it is always
possible to re-orient the arcs of $\vec{G}$ with both endpoints in $X_j$ in such a way that in-seq($X_j$)
realizes the latter sequence, as illustrated in Figure 3. Because this modification leaves the
in-degrees of the vertices in $V \setminus X_j$ unchanged, we deduce from Lemma 1 that the new
orientation has an entropy strictly smaller than that of $\vec{G}$, a contradiction. □

Using Claim 1, we may assume without loss of generality that $\vec{G}[X_j]$ is isomorphic to the
orientation of the element-gadget given in Figure 3. Renaming the vertices if necessary, we
may thus suppose that $u_{j,1}, u_{j,2}, u_{j,3}$ have in-degree 0, 2, 3 in $\vec{G}[X_j]$ respectively.

Consider now the three edges $u_{j,1}s_{j_1}$, $u_{j,2}s_{j_2}$, $u_{j,3}s_{j_3}$ of $G$. Recall that $j_1, j_2, j_3$ denote the
indices of the three sets in $\mathcal{S}$ containing the element $u_j$. Because $\vec{G}$ has minimum entropy,
we may assume that the first of these three edges is oriented out of $X_j$ and the last two
toward $X_j$. Indeed, if we have $(u_{j,3}, s_{j_3}) \in A$ then $\rho_{\vec{G}}(s_{j_3}) \le 3$, $\rho_{\vec{G}}(u_{j,3}) = 3$, and changing
the orientation of $u_{j,3}s_{j_3}$ would decrease the entropy of $\vec{G}$ by Lemma 1, a contradiction.
Moreover, if $(u_{j,2}, s_{j_2}) \in A$, then re-orienting $u_{j,2}s_{j_2}$ either leaves the entropy of $\vec{G}$ unchanged
or decreases it. A similar argument holds when $(s_{j_1}, u_{j,1}) \in A$.

It follows that there is exactly one arc going out of $X_j$ in $\vec{G}$, for every $j \in \{1, \dots, q\}$. We
proceed with a second (and last) claim.

Claim 2. Let $m := |A|$ $(= 12q)$. Then the entropy of $\vec{G}$ is at least

$$\frac{1}{m} \left( 4q \log(m/4) + 7q \log(m/3) + q \log m \right),$$

with equality if and only if there exists an exact cover of $(U, \mathcal{S})$.

Proof. As we have seen, we have without loss of generality in-seq($X_j$) = (4,3,3,1,0,0) for
every $j \in \{1, \dots, q\}$. Then, each component of in-seq($S$), where $S := \{s_1, s_2, \dots, s_q\}$, is
clearly at most 3, and the sum of all of them equals $q$. Combining these observations with
Lemma 1, we deduce that the entropy of $\vec{G}$ is at least the lower bound given in the claim,
and that equality occurs if and only if in-seq($S$) equals (3, 3, ..., 3, 0, 0, ..., 0). We show that
the latter happens if and only if there exists an exact cover of $(U, \mathcal{S})$.

Suppose first in-seq($S$) = (3, 3, ..., 3, 0, 0, ..., 0), and define $\mathcal{S}^* \subseteq \mathcal{S}$ as

$$\mathcal{S}^* := \{S_i : \rho_{\vec{G}}(s_i) > 0\}.$$

It is then easily seen that the collection $\mathcal{S}^*$ is an exact cover for the set system $(U, \mathcal{S})$.

Now assume that $\mathcal{S}' \subseteq \mathcal{S}$ is an exact cover of $(U, \mathcal{S})$. After permuting the indices $j_1$, $j_2$
and $j_3$, we can assume that the set of $\mathcal{S}'$ containing $u_j$ is $S_{j_1}$. Orienting each edge $s_{j_k}u_{j,k}$ of
$G$ toward $s_{j_k}$ if $k = 1$, toward $u_{j,k}$ otherwise, and using the orientation of the element-gadgets
given in Figure 3, we obtain an orientation $\vec{G}^*$ of $G$ where each $X_j$ has in-degree sequence
(4, 3, 3, 1, 0, 0) and $S$ has in-degree sequence (3, 3, ..., 3, 0, 0, ..., 0). Hence, $S$ must also have
the same in-degree sequence in $\vec{G}$, since otherwise $\vec{G}^*$ would have entropy strictly less than
$\vec{G}$, contradicting the optimality of the latter orientation. The claim follows. □
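
The lower bound of Claim 2 can be checked numerically from the in-degree sequences used in the proof: q gadgets with in-degree sequence (4, 3, 3, 1, 0, 0), plus q/3 set vertices of in-degree 3 in the exact cover case. The Python sketch below is only a sanity check of this arithmetic; all function names are assumptions of this example.

```python
from math import log2, isclose

def claim2_bound(q):
    """Lower bound of Claim 2 for an instance with q elements and q sets."""
    m = 12 * q
    return (4 * q * log2(m / 4) + 7 * q * log2(m / 3) + q * log2(m)) / m

def entropy_of_in_degrees(in_degrees):
    """Entropy (in bits) of the distribution in_degrees / sum(in_degrees)."""
    m = sum(in_degrees)
    return -sum((k / m) * log2(k / m) for k in in_degrees if k > 0)

def exact_cover_in_degrees(q):
    """In-degrees when every gadget is oriented as in the proof and the set
    vertices have in-degree sequence (3, ..., 3, 0, ..., 0); q is a multiple of 3."""
    gadgets = [4, 3, 3, 1, 0, 0] * q
    set_vertices = [3] * (q // 3) + [0] * (q - q // 3)
    return gadgets + set_vertices

bound_is_attained = isclose(entropy_of_in_degrees(exact_cover_in_degrees(3)),
                            claim2_bound(3))
```

Replacing the set-vertex in-degrees by a less unbalanced sequence such as (1, 1, 1) for q = 3 gives a strictly larger entropy, as Lemma 1 predicts.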

By Claim 2, a polynomial-time algorithm finding a minimum entropy orientation of $G$
could be used to decide, in polynomial time, if there exists an exact cover of $(U, \mathcal{S})$. This
completes the proof of the theorem. □

Let us say that a graph orientation problem has the strict dominance property if the
objective function $F$ to minimize is such that

$$F(\vec{G}) < F(\vec{G}')$$

whenever $\vec{G}$ and $\vec{G}'$ are two orientations of a graph $G = (V, E)$ such that the in-degree
sequence of $V$ in $\vec{G}$ dominates that of $V$ in $\vec{G}'$. Hence, Lemma 1 says exactly that MINEO
has the strict dominance property. We remark that, since the proof of Theorem 1 relies
solely on that lemma, it follows more generally that every orientation problem with the strict
dominance property is NP-hard on planar graphs.

3 Approximation 

Throughout this section, we denote by OPT(G) the minimum entropy of an orientation of $G$.
An orientation of $G$ is biased if each edge $vw$ with $\deg(v) > \deg(w)$ is oriented toward $v$. It
turns out that biased orientations have entropy close to the minimum achievable:

Theorem 2. The entropy of any biased orientation of $G$ is at most OPT(G) + 1.

Since finding a biased orientation can easily be done in linear time, Theorem 2 yields the
following corollary:

Corollary 1. MINEO can be approximated within an additive error of 1 bit, in linear time.
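
A biased orientation can be computed in two passes over the edge list, one to count degrees and one to orient each edge toward its endpoint of larger degree. The Python sketch below illustrates this; the function name `biased_orientation`, the edge-list input, and the tie-breaking rule (ties go to the second endpoint) are assumptions of this example.

```python
from collections import Counter

def biased_orientation(edges):
    """Return a biased orientation of the multigraph as a list of (tail, head) arcs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Orient each edge toward the endpoint of strictly larger degree.
    # When the degrees are equal, either direction is allowed; we pick v.
    return [(v, u) if deg[u] > deg[v] else (u, v) for u, v in edges]

path = [(1, 2), (2, 3)]
path_arcs = biased_orientation(path)   # both edges point to the middle vertex 2
```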

Let $m$ denote the number of edges of $G$ and $d_v := \deg(v)/(2m)$ the normalized degree of a
vertex $v$. Given two discrete probability distributions $p$ and $q$ over a common domain $X$, we
denote by $D(p \,\|\, q)$ their relative entropy (or Kullback-Leibler distance), defined as follows:

$$D(p \,\|\, q) := \sum_{i \in X} p_i \log \frac{p_i}{q_i}.$$

It is known that $D(p \,\|\, q)$ is always non-negative (see for instance Cover and Thomas,
Section 2.6, for a proof).
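
For readers who want to experiment with this quantity, here is a minimal Python sketch of the relative entropy in bits. The function name `relative_entropy` and the representation of distributions as dictionaries over a common domain are assumptions of this example; it also assumes that q is positive wherever p is.

```python
from math import log2

def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / q[x]) for x, pi in p.items() if pi > 0)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
d_pq = relative_entropy(p, q)   # strictly positive because p differs from q
d_pp = relative_entropy(p, p)   # zero
```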
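
Proof of Theorem 2. Let $\vec{G}$ be an optimal orientation of $G$. Denote respectively by $p$ and $A$
the in-degree distribution and arc set of $\vec{G}$. We first rewrite the entropy of $p$ as follows:

$$\begin{aligned}
\mathrm{OPT}(G) &= \sum_{v \in V} -p_v \cdot \log p_v \\
&= \sum_{v \in V} -\frac{\rho_{\vec{G}}(v)}{m} \cdot \log \frac{\rho_{\vec{G}}(v)}{m} \\
&= \log m - \frac{1}{m} \sum_{v \in V} \rho_{\vec{G}}(v) \cdot \log \rho_{\vec{G}}(v) \\
&= \log m - \frac{1}{m} \sum_{(u,v) \in A} \log \rho_{\vec{G}}(v).
\end{aligned}$$

Now we observe that for any vertex $v$ we have $\rho_{\vec{G}}(v) \le \deg(v)$. We let $\vec{G}^b$ be a biased
orientation, $A^b$ its arc set, and $p^b$ the corresponding in-degree distribution. From our observation,
we have

$$\begin{aligned}
\mathrm{OPT}(G) &\ge \log m - \frac{1}{m} \sum_{(u,v) \in A} \log \max\{\deg(u), \deg(v)\} \\
&= \log m - \frac{1}{m} \sum_{(u,v) \in A^b} \log \deg(v) \qquad \text{(because $\vec{G}^b$ is biased)} \\
&= \log m - \frac{1}{m} \sum_{v \in V} \rho_{\vec{G}^b}(v) \cdot \log \deg(v) \\
&= \sum_{v \in V} -p^b_v \cdot \log \frac{\deg(v)}{m} \\
&= \Big( \sum_{v \in V} -p^b_v \cdot \log \frac{\deg(v)}{2m} \Big) - 1 \\
&= H(p^b) + D(p^b \,\|\, d) - 1.
\end{aligned}$$

Since $D(p^b \,\|\, d) \ge 0$, we have $H(p^b) \le \mathrm{OPT}(G) + 1$, which concludes the proof. □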

We note that the bound given in Theorem 2 is tight: consider for instance the case where
$G$ is a cycle.
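
The cycle example can be verified by brute force on a small instance. In a cycle all degrees equal 2, so every orientation is biased, including the one in which every vertex has in-degree 1. The Python sketch below compares that orientation with the optimum found by enumerating all orientations of a 6-cycle; the function names and the choice n = 6 are assumptions of this example, and the enumeration is only practical for tiny graphs.

```python
from collections import Counter
from itertools import product
from math import log2, isclose

def entropy(arcs):
    """Entropy (in bits) of the in-degree distribution of a list of (tail, head) arcs."""
    m = len(arcs)
    indeg = Counter(head for _, head in arcs)
    return -sum((k / m) * log2(k / m) for k in indeg.values())

def brute_force_opt(edges):
    """Minimum entropy over all 2^m orientations of the edge list."""
    best = float("inf")
    for flips in product((False, True), repeat=len(edges)):
        arcs = [(v, u) if flip else (u, v) for (u, v), flip in zip(edges, flips)]
        best = min(best, entropy(arcs))
    return best

n = 6
cycle = [(i, (i + 1) % n) for i in range(n)]
# Orienting every edge (i, i+1) toward i+1 gives in-degree 1 everywhere.
clockwise = cycle
gap = entropy(clockwise) - brute_force_opt(cycle)
```

Here the optimum alternates the directions so that half of the vertices have in-degree 2, and the gap is exactly 1 bit up to floating-point rounding.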

We end this section with some remarks. For $S \subseteq V$, let $e(S)$ denote the fraction of edges
of $G$ incident to a vertex in $S$. Thus $e(V) = 1$. It is well-known that $e$ is a submodular
function, that is, satisfies $e(X) + e(Y) \ge e(X \cap Y) + e(X \cup Y)$ for all $X, Y \subseteq V$. We denote
the base polytope of $e$ by $P(G)$. Letting $p(S) := \sum_{v \in S} p_v$ for $S \subseteq V$, we thus have

$$P(G) = \{p \in \mathbb{R}^V : p(S) \le e(S)\ \ \forall S \subseteq V,\ p(V) = 1\}.$$

It follows from standard results on polymatroids that $P(G)$ is the convex hull of all the in-degree
distributions of orientations of $G$; see for instance Schrijver. The vertices of $P(G)$
correspond to the acyclic orientations of $G$.

Consider now the following generic linear program, where for each $v \in V$, $c_v$ is a fixed
non-negative cost:

$$\begin{aligned} \min\ & \sum_{v \in V} c_v\, p_v \\ \text{s.t.}\ & p \in P(G). \end{aligned} \qquad (2)$$

This linear program can be solved by the following greedy algorithm: First order the vertices
in $V$ in non-decreasing order of costs, say $v_1, v_2, \dots, v_n$. Then, start with the null vector
$p := (0, 0, \dots, 0)$, and, for each $i = 1, \dots, n$, increase the $i$th component of $p$ as much as
possible, ensuring that $p(S) \le e(S)$ remains true at all times, for every $S \subseteq V$. It is
well-known that the resulting point $p$ belongs to $P(G)$, that is, it satisfies also
$p(V) = 1$, and furthermore that $p$ is an optimal solution to the above linear program.
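
The greedy procedure can be sketched in Python as follows. The sketch relies on a standard fact about polymatroids, stated here as an assumption rather than taken from the text: the largest feasible increase of the ith component equals the marginal value of e, that is, the fraction of edges incident to the ith vertex but to no earlier vertex in the order. The function name `greedy_base_point` is hypothetical.

```python
def greedy_base_point(order, edges):
    """Greedy point of P(G) for a vertex order given from lowest to highest cost."""
    m = len(edges)
    seen = set()
    p = {}
    for v in order:
        # Marginal value e(seen + v) - e(seen): edges incident to v
        # that are not incident to any earlier vertex.
        new_edges = sum(1 for a, b in edges
                        if v in (a, b) and a not in seen and b not in seen)
        p[v] = new_edges / m
        seen.add(v)
    return p

path = [(1, 2), (2, 3)]
p_middle_first = greedy_base_point([2, 1, 3], path)   # all the mass on vertex 2
```

The resulting point is the in-degree distribution of the orientation in which every edge points to its endpoint that comes first in the order.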
Let us set $c_v := -\log(\deg(v)/m)$. Thus, we obtain the following linear program:

$$\begin{aligned} \min\ & \sum_{v \in V} -p_v \cdot \log \frac{\deg(v)}{m} \\ \text{s.t.}\ & p \in P(G). \end{aligned} \qquad (3)$$



Since MINEO can be formulated as

$$\begin{aligned} \min\ & \sum_{v \in V} -p_v \cdot \log p_v \\ \text{s.t.}\ & p \in P(G), \end{aligned} \qquad (4)$$

and $-p_v \cdot \log p_v \ge -p_v \cdot \log(\deg(v)/m)$ trivially holds for every $p \in P(G)$, the optimum value
of (3) gives a lower bound on OPT(G). Now, observe that performing the greedy algorithm
to solve (3) amounts to finding a biased orientation of $G$. Moreover, the in-degree sequence
of every such orientation can be produced by the algorithm.

To conclude, we mention that the natural counterpart of MINEO where one aims at
finding an orientation of $G$ with maximum entropy is polynomial. This is because maximizing
a separable concave function over $P(G) \cap \frac{1}{m}\mathbb{Z}^V$, the set of in-degree distributions of
orientations of $G$, can be done in polynomial time; see e.g. [7] and [10].